Figure 6: The program's experience consists of a trajectory through state space. At time step t, the state is s_t, and the agent faces a choice of actions. The action the agent chooses to execute at step t is a_t. The reward at step t, Reward_t, is a function of s_t and a_t. The next state s_{t+1} depends on s_t, a_t, and many random events such as passengers arriving at floors and pushing buttons. Reinforcement learning allows a program to use such a trajectory to incrementally improve its policy.
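
The trajectory described in the caption can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and does not come from the source: the toy state (a capped count of waiting passengers), the two actions, the reward formula, the 0.3 arrival probability, and the names `reward`, `next_state`, `rollout`, and `serve_if_waiting`. The sketch shows only the structure of the experience: Reward_t computed from s_t and a_t, and s_{t+1} determined by s_t, a_t, and a random event.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical toy model (not from the source): the state is the number of
# waiting passengers (capped), and the action is 0 = idle or 1 = serve.
State = int
Action = int
Step = Tuple[State, Action, float, State]  # (s_t, a_t, Reward_t, s_{t+1})

MAX_WAITING = 5  # assumed cap on the number of waiting passengers


def reward(s: State, a: Action) -> float:
    """Reward_t depends only on s_t and a_t: waiting is penalized, serving has a small cost."""
    return -float(s) - 0.5 * a


def next_state(s: State, a: Action, rng: random.Random) -> State:
    """s_{t+1} depends on s_t, a_t, and a random passenger arrival."""
    remaining = max(s - 1, 0) if a == 1 else s
    arrival = 1 if rng.random() < 0.3 else 0  # assumed arrival probability
    return min(remaining + arrival, MAX_WAITING)


def rollout(policy: Callable[[State], Action], s0: State, steps: int,
            rng: random.Random) -> List[Step]:
    """Follow the policy for `steps` time steps and record the trajectory."""
    trajectory: List[Step] = []
    s = s0
    for _ in range(steps):
        a = policy(s)                   # the agent chooses a_t in state s_t
        r = reward(s, a)                # Reward_t = f(s_t, a_t)
        s_next = next_state(s, a, rng)  # s_{t+1} from s_t, a_t, and chance
        trajectory.append((s, a, r, s_next))
        s = s_next
    return trajectory


def serve_if_waiting(s: State) -> Action:
    """A fixed illustrative policy: serve whenever someone is waiting."""
    return 1 if s > 0 else 0


trajectory = rollout(serve_if_waiting, s0=2, steps=10, rng=random.Random(0))
```

Each recorded tuple is one step of experience; a learning algorithm would consume tuples like these to adjust the policy incrementally, which this sketch does not attempt.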